Two-Stage Data Mining for Big Statistical Micro Data
Authors
Abstract
We apply a two-stage data mining strategy to handle and analyze big statistical micro data sets. The first stage consists of smart aggregation of such micro data, and the second stage analyzes and visualizes the smartly aggregated data further. The smart aggregation requires three steps. The first is to decide on and create the appropriate aggregates themselves, also called ‘concepts.’ Second, the characteristics of the concepts need to be implemented. The Symbolic Data Analysis (SDA) approach offers good tools for this. In this paper, our symbolic variables are of two kinds: frequencies of categorical variables and intervals of continuous variables. The third step is to operationalize the first two steps by creating the new data set. This operation is performed by the SYR software for SDA from the Syrokko company. We present our methodology with empirical data from the fifth round of the European Social Survey.
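The three aggregation steps described in the abstract can be illustrated with a minimal sketch. It is not the SYR software and not the authors' implementation: the function name `aggregate`, the field names, and the toy records are all hypothetical, and the interval is assumed to be the simple minimum-to-maximum range, which is only one of several ways SDA can define an interval-valued variable.

```python
from collections import Counter, defaultdict

# Toy micro data: invented records for illustration only,
# NOT taken from the European Social Survey.
records = [
    {"country": "A", "employment": "employed", "age": 34.0},
    {"country": "A", "employment": "retired", "age": 67.0},
    {"country": "A", "employment": "employed", "age": 45.0},
    {"country": "B", "employment": "student", "age": 21.0},
    {"country": "B", "employment": "employed", "age": 52.0},
]


def aggregate(records, concept_key, categorical, continuous):
    """Aggregate micro records into one symbolic description per concept.

    Hypothetical helper: the categorical variable becomes a dict of
    relative frequencies, the continuous variable a (min, max) interval.
    """
    # Step 1: form the concepts by grouping records on the chosen key.
    groups = defaultdict(list)
    for record in records:
        groups[record[concept_key]].append(record)

    symbolic = {}
    for concept, rows in groups.items():
        n = len(rows)
        # Step 2a: relative frequencies of the categorical variable.
        counts = Counter(row[categorical] for row in rows)
        frequencies = {category: count / n for category, count in counts.items()}
        # Step 2b: interval of the continuous variable (assumed min-max range).
        values = [row[continuous] for row in rows]
        interval = (min(values), max(values))
        # Step 3: store the description as one row of the new data set.
        symbolic[concept] = {"n": n, categorical: frequencies, continuous: interval}
    return symbolic


symbolic_data = aggregate(records, "country", "employment", "age")
for concept, description in symbolic_data.items():
    print(concept, description)
```

Each concept ends up as a single row whose cells hold a distribution or an interval rather than a single value, which is the kind of aggregated data set the second stage would then analyze and visualize.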
Similar resources
Design and Test of the Real-time Text mining dashboard for Twitter
One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in datasets that are currently being produced at high speed, in large volumes, and in a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing this huge amount of data has become one of the concerns of data science scholar...
SAMOA: a platform for mining big data streams
Social media and user-generated content are causing an ever-growing data deluge. The rate at which we produce data is growing steadily, thus creating larger and larger streams of continuously evolving data. Online news, micro-blogs, and search queries are just a few examples of these continuous streams of user activities. The value of these streams lies in their freshness and relatedness to ongoi...
Multi-Objective Model for Fair Pricing of Electricity Using the Parameters from the Iran Electricity Market Big Data Analysis
Assessment of the electricity market shows that electricity market data can be considered "big data". This data has been analyzed by both conventional and modern data mining methods. The predicted variables of supply and demand are considered to be the input of a defined multi-objective for predicting electricity price, which is the result of the defined model. This shows the advantage of appl...
Data Partitioning View of Mining Big Data
There are two main approximations of mining big data in memory. One is to partition a big dataset to several subsets, so as to mine each subset in memory. By this way, global patterns can be obtained by synthesizing all local patterns discovered from these subsets. Another is the statistical sampling method. This indicates that data partitioning should be an important strategy for mining big da...
Fast Adaptive Real-Time Classification for Data Streams with Concept Drift
An important application of Big Data Analytics is the realtime analysis of streaming data. Streaming data imposes unique challenges to data mining algorithms, such as concept drifts, the need to analyse the data on the fly due to unbounded data streams and scalable algorithms due to potentially high throughput of data. Real-time classification algorithms that are adaptive to concept drifts and ...
Publication date: 2013